-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster strpos() string function for ASCII-only case #12401
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great 🚀 The implementation looks good to me
However, I'm not really sure why the normal string array can't be improved. 🤔
I think it's not executed and triggered some error, see benchmark StringArray
cases only take several ns, and StringViewArray
cases are at least several us
I guess it's because in micro bench, arguments are of different physical string type and break the function implementation (strpos(string_col, string_view_col)
)
With #12415 this kind of bug can be more throughly tested
Update: It's not a bug for SQL API, and this can only be triggered if we use raw expression API as in the benchmark
{ | ||
let string_iter = ArrayIter::new(string_array); | ||
let substring_iter = ArrayIter::new(substring_array); | ||
let ascii_only = string_array.is_ascii() && substring_array.is_ascii(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let ascii_only = string_array.is_ascii() && substring_array.is_ascii(); | |
let ascii_only = substring_array.is_ascii() && string_array.is_ascii() ; |
I think we can check substring first, since it's cheaper in common cases and we can then shortcircuit checking string_array
in some case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice suggestion! I did the benchmark again. The performance is improved!
group before after
----- ----- -------
strpos_StringArray_ascii_str_len_128 1.06 370.5±9.46ns ? ?/sec 1.00 348.6±11.38ns ? ?/sec
strpos_StringArray_ascii_str_len_32 1.07 372.5±10.23ns ? ?/sec 1.00 346.9±9.45ns ? ?/sec
strpos_StringArray_ascii_str_len_4096 1.08 378.4±12.05ns ? ?/sec 1.00 349.4±16.83ns ? ?/sec
strpos_StringArray_ascii_str_len_8 1.07 371.4±14.53ns ? ?/sec 1.00 346.1±28.44ns ? ?/sec
strpos_StringArray_utf8_str_len_128 1.06 377.2±18.04ns ? ?/sec 1.00 356.4±21.92ns ? ?/sec
strpos_StringArray_utf8_str_len_32 1.08 374.9±34.32ns ? ?/sec 1.00 345.8±11.05ns ? ?/sec
strpos_StringArray_utf8_str_len_4096 1.09 381.6±16.68ns ? ?/sec 1.00 351.4±23.14ns ? ?/sec
strpos_StringArray_utf8_str_len_8 1.09 372.9±20.83ns ? ?/sec 1.00 343.3±11.78ns ? ?/sec
strpos_StringViewArray_ascii_str_len_128 1.79 3.2±0.15ms ? ?/sec 1.00 1763.8±44.16µs ? ?/sec
strpos_StringViewArray_ascii_str_len_32 1.03 648.7±18.27µs ? ?/sec 1.00 628.1±21.69µs ? ?/sec
strpos_StringViewArray_ascii_str_len_4096 1.24 62.8±7.27ms ? ?/sec 1.00 50.5±2.05ms ? ?/sec
strpos_StringViewArray_ascii_str_len_8 1.00 280.2±10.44µs ? ?/sec 1.05 294.8±42.47µs ? ?/sec
strpos_StringViewArray_utf8_str_len_128 1.03 5.3±0.13ms ? ?/sec 1.00 5.1±0.12ms ? ?/sec
strpos_StringViewArray_utf8_str_len_32 1.01 1961.9±57.34µs ? ?/sec 1.00 1944.1±73.19µs ? ?/sec
strpos_StringViewArray_utf8_str_len_4096 1.03 147.4±9.24ms ? ?/sec 1.00 142.5±3.61ms ? ?/sec
strpos_StringViewArray_utf8_str_len_8 1.01 874.5±26.02µs ? ?/sec 1.00 863.4±40.68µs ? ?/sec
You're right. There are some errors here. I ran the benchmark in the
It seems that Then, I realized it was my mistake 😢. When I generated the substring for the |
5aae0fb
to
62f2307
Compare
] | ||
} else { | ||
let string_array: StringArray = output_string_vec.clone().into_iter().collect(); | ||
let sub_string_array: StringArray = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rebase to make all benchmark-related changes in the first commit. It was
let sub_string_array: StringViewArray = output_string_vec.clone().into_iter().collect();
After changing to StringArray
, the benchmark works fine.
@2010YOUY01 Thank you for the review. I updated the benchmark result. After fixing the benchmark, we can find the StringArray case also be improved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alamb @2010YOUY01 |
Which issue does this PR close?
Closes #12366.
Rationale for this change
For ASCII-only cases, I tried to use the slice window to find the position instead of the string pattern matching.
We can find that the ASCII-only
StringViewArray
case performs 50% faster than before.However, I'm not really sure why the normal string array can't be improved. 🤔What changes are included in this PR?
If both arguments are ASCII-only, we can perform this faster method.
The benchmark result:
Are these changes tested?
yes
Are there any user-facing changes?